Apprentissage automatique d'un chunker pour le français (Machine Learning of a chunker for French) [in French]

Authors

  • Isabelle Tellier
  • Denys Duchier
  • Iris Eshkol-Taravella
  • Arnaud Courmet
  • Mathieu Martinet

Abstract

We describe in this paper how to automatically learn a chunker for French from the French Tree Bank using CRFs (Conditional Random Fields). We ran several experiments, either to recognize every possible kind of chunk or to focus on simple nominal phrases only. We evaluate the resulting chunker on internal data (i.e., also extracted from the French Tree Bank) as well as on external data (i.e., from a distinct corpus) to measure its robustness. Keywords: chunking, machine learning, French Tree Bank, CRF.

Similar resources

Un segmenteur-étiqueteur et un chunker pour le français (A Segmenter-POS Labeller and a Chunker for French) [in French]

We propose a demo of two software tools: a Segmenter-POS Labeller for French and a Chunker for texts processed by the first program. Both were learned from the French Tree Bank. Keywords: POS tagging, chunking, machine learning, French Tree Bank, CRF.

Building and exploiting a French corpus for sentiment analysis (Construction et exploitation d'un corpus français pour l'analyse de sentiment) [in French]

This work introduces a French corpus for sentiment analysis. We describe the construction and organization of the corpus. We then apply machine learning techniques to automatically predict whether a text is positive or negative (the opinion classification task). Two techniques are used: logistic regression and classification based ...

Can we chunk well with bad POS labels? (Peut-on bien chunker avec de mauvaises étiquettes POS ?) [in French]

In this paper, we test two distinct approaches to chunk transcribed oral data, trying to minimize the phases of manual correction. First, we use an existing chunker, learned from written texts, then we try to learn a new specific chunker from a small amount of manually corrected labeled oral data. The purpose is to reach the best possible results for the chunker with as few manual corrections o...

A Named Entity recognizer for French (Un reconnaisseur d'entités nommées du Français) [in French]

We propose to demonstrate a French named entity recognizer trained on the French TreeBank enriched with named entity annotations. Keywords: NER, POS, machine learning, French Treebank, information extraction, CRF.

Publication date: 2012